Processing Data Streams with the RapidMiner Streams Plugin
Authors
Abstract
In many applications we face large volumes of data that often grow continuously. Such data arise in monitoring settings such as server log files, manufacturing processes, and sensor networks, or in high-volume news feeds such as Twitter. Analysis of such data differs from the traditional batch setting that RapidMiner was initially designed for. In this work we present the streams library, a simple and easy-to-use framework for continuously processing streaming data. It comes with the Streams Plugin, which integrates its streaming capabilities into the RapidMiner suite. We give an overview of the architecture of the streams library and its RapidMiner integration, and demonstrate its usefulness for processing very large and continuous data in several use cases.
Related resources
Implementing Hierarchical Heavy Hitters in RapidMiner: Solutions and Open Questions
Huge masses of data and potentially infinite data streams pose big challenges to data mining methods that analyse data off-line and in several passes. In the area of intrusion detection, algorithms that detect characteristic patterns in system call data may have to process several hundred megabytes of data per minute. We describe a plugin for the aggregation of data streams by determinin...
Robust GPGPU plugin development for RapidMiner
In recent years, a significant number of papers [1][2] have been published about general-purpose graphics processing unit (GPGPU) programs that can accelerate computationally intensive applications several times over conventional CPU programs. These papers raise an important question: With the current developer tools is it possible to integrate these GPU programs into a major indust...
PiPo, a Plugin Interface for Afferent Data Stream Processing Operators
We present PiPo, a plugin API for data stream processing with applications in interactive audio processing and music information retrieval as well as potentially other domains of signal processing. The development of the API has been motivated by our recurrent need to use a set of signal processing modules that extract low-level descriptors from audio and motion data streams in the context of d...
An Information Extraction Plugin for RapidMiner 5
In this paper we present a RapidMiner 5 Information Extraction Plugin that allows the use of information extraction (IE) techniques within the open source data mining software RapidMiner [1]. The plugin can be seen as an interface between natural language and IE or data mining methods, because it converts documents containing natural language texts into machine-readable form – preserving t...
Technische Universität Dortmund Subproject A1 Data Mining for Ubiquitous System Software Information Extraction in Rapidminer
This paper describes the Information Extraction Plugin [3], which allows the use of Information Extraction mechanisms in RapidMiner.
Publication date: 2012